METHOD AND APPARATUS FOR ADAPTIVELY SELECTING A NUMBER OF
REFERENCE PICTURES FOR ENCODERS

BACKGROUND OF THE INVENTION 

Recent video coding standards and architectures employ multiple reference
pictures for motion estimation and compensation in an attempt to improve coding
efficiency. On the other hand, the use of multiple references can considerably
increase the complexity of the encoder, since more pictures need to be examined
during the motion estimation process. Furthermore, since the reference index must
also be included within the bitstream (e.g. for every macroblock or macroblock
partition; blocks of 16x8, 8x16 or 8x8 pixels), it is not always certain that
multiple references would provide benefits in encoding a particular picture (i.e. a
picture may be biased towards only a single reference). Considering, for example,
that for each macroblock in H.264 it is possible to transmit up to 4 reference
indices for Predictive (P) pictures and 8 for Bi-directionally predictive (B)
pictures, the bitrate overhead due to the reference indices could be quite
significant. It would be desirable instead to be able to decide the number of
references prior to encoding a given picture: if only one reference is used, the
above-mentioned overhead is eliminated, possibly improving encoding performance,
while at the same time complexity is reduced since fewer references are tested
during motion estimation. In H.264, for example, the number of references is
controlled through the num_ref_idx_lN_active_minus1 parameter (N is equal to 0 for
list0 and 1 for list1), which is signaled at the slice level. If this parameter is
equal to 0, then no other reference index information needs to be transmitted for
that list in the current slice.

In previous standards and architectures such as MPEG-2 and MPEG-4, only a
single reference was used. For the encoding of motion vectors, a special code
named the f_code parameter was also transmitted within the bitstream for every
picture and was used for the determination and decoding of the motion vectors.
Essentially, this parameter was derived during the motion estimation process and
affected the VLC coding of the motion vectors. Previous proposals for automatically
adapting the f_code for every picture, depending on its motion parameters and
range, could achieve better coding efficiency compared to keeping the parameter
fixed. Nevertheless, newer standards such as H.264 do not support this parameter
and essentially use predefined VLC codes for the encoding of the motion vectors,
and thus no such property could be utilized. On the other hand, H.264 supports
multiple references, which require a parameter similar to f_code to be transmitted,
but no equivalent work has been done up to this point.


SUMMARY OF THE INVENTION 

These and other drawbacks and disadvantages of the prior art are addressed
by an apparatus and method that enable a video encoder to achieve similar or even
better quality, at lower complexity, by adaptively selecting the number of
reference pictures that are used during the encoding process. The decision can be
based on previously generated information, such as picture correlation, reference
picture motion vectors, residuals, etc., and could also be based on a
rate-distortion optimal method.



DETAILED DESCRIPTION

In accordance with the principles of the present invention, a new method is 
presented for deciding the number of references that will be used for the encoding of 
the current picture. 

A relatively simple method for selecting the number of references for a given
picture would be, similar to the method performed for f_code in MPEG-2 and
MPEG-4, to encode the picture in a first pass using all possible references and
then, in a second pass, recode the picture using only the pictures that were
actually referenced. An additional consideration could be made on whether the
number of macroblocks or blocks that reference a given picture satisfies a given
condition/threshold. If this condition is not satisfied, this reference is also
removed from the reference buffer, and these macroblocks/blocks are then predicted
from the remaining references. Although such methods could potentially lead to
better encoding performance, they also introduce considerably higher complexity,
since a picture needs to be coded twice. This is especially burdensome in codecs
such as H.264, due to their already very high complexity. Nevertheless, in a more
brute force approach, it is also possible to encode the same picture K times,
using from 1 to M references, where K is equal to:

$$K = \sum_{i=1}^{M} \frac{M!}{(M-i)!}$$

which basically implies all possible arrangements and combinations (permutations)
of references, including reordering. From these, we can select the one that gives the
least distortion, or bitrate, or use rate distortion optimization criteria (e.g. through the
use of Lagrangian multipliers in the form of J = D + λ×R).
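
As a rough illustration of the cost of this brute force approach, the sketch below counts the encoding passes under the assumption that K counts every ordered selection of 1 to M references, and then picks the candidate with the lowest Lagrangian cost. The function names and the (label, distortion, rate) tuple layout are hypothetical and are used here only for illustration.

```python
from math import factorial


def brute_force_passes(m: int) -> int:
    """Count ordered reference lists of length 1..m drawn from m pictures.

    Assumes K is the sum over i = 1..m of m! / (m - i)!.
    """
    return sum(factorial(m) // factorial(m - i) for i in range(1, m + 1))


def pick_best(candidates, lam):
    """Return the candidate with the lowest cost J = D + lam * R.

    candidates: iterable of (label, distortion, rate_bits) tuples
    (a hypothetical layout, not taken from any encoder API).
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])


# Three references already require 3 + 6 + 6 = 15 encoding passes.
passes_for_three = brute_force_passes(3)
```

The rapid growth of K with M (325 passes for M = 5) is what motivates a cheaper decision rule.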


It is possible to design a much simpler approach for deciding the number of
references, without sacrificing much in encoding quality or bitrate. In
particular, we have found that high correlation exists in the indices of the
references used between adjacent pictures. Such correlation increases further when
the two pictures are of high similarity (e.g. their absolute difference is below a
given, relatively small threshold). For example, if the immediately previously
coded picture, at time t-1, references only the picture at t-2, has little motion,
and is very similar to the current picture at time t (e.g. picture MAD < 4), this
suggests that it is also very likely that the current picture would be using a
single reference. An additional simple comparison (e.g. absolute difference)
between the current picture and the remaining references could also be performed
to further enhance this decision. Finally, as an additional rule, the motion
vectors and reference indices not only of the closest but also of all other
references could be considered for this decision.
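
The correlation rule described above can be sketched as follows. This is only a minimal illustration: pictures are assumed to be equally sized lists of pixel rows, the MAD threshold of 4 comes from the example in the text, and the function names, the argument layout, and the motion limit of one pixel are assumptions made for the sketch.

```python
def mean_abs_diff(cur, ref):
    """Mean absolute difference between two equally sized pictures."""
    total = sum(abs(a - b)
                for row_c, row_r in zip(cur, ref)
                for a, b in zip(row_c, row_r))
    count = sum(len(row) for row in cur)
    return total / count


def likely_single_reference(cur, prev, prev_ref_indices, prev_mvs,
                            mad_thresh=4.0, mv_limit=1):
    """Guess whether the picture at time t needs only one reference.

    prev_ref_indices: reference index chosen by each block of the picture at t-1.
    prev_mvs: (mv_x, mv_y) chosen by each block of the picture at t-1.
    mv_limit is an assumed bound for "little motion".
    """
    used_only_nearest = all(idx == 0 for idx in prev_ref_indices)
    low_motion = all(abs(mx) <= mv_limit and abs(my) <= mv_limit
                     for mx, my in prev_mvs)
    similar = mean_abs_diff(cur, prev) < mad_thresh
    return used_only_nearest and low_motion and similar
```

All three conditions are required here; a real encoder might instead weight them or relax them to percentages.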

A similar concept could also be applied to B pictures. In this case, considering
that B pictures are usually contained within a forward (list0) and a backward
(list1) reference, we may also add an additional condition depending on the motion
vectors and reference indices of both references. For example, if all or a very
high percentage (e.g. 90%) of the blocks in the backward reference picture use the
first forward picture as reference, then using only a single reference for list0
is very likely to be more beneficial, considering the bits saved from not having to
code the reference indices. In reality, experimental results show that B pictures
do not benefit as much as P pictures from multiple references, considering also the
high use of skip modes within this picture type, and this option could be
completely disabled without having to perform a reference number decision, and
without much impact on quality.

A more specific and rather simple strategy for selecting the number of
references, in an exemplary embodiment, is as follows:

• Compute the sum of absolute differences (SAD) at the block or
macroblock level between the current picture and the first reference in the given
list (list0 or list1). This difference could be computed using either the original
or the reconstructed version of the reference. Compute also the mean absolute
difference (MAD) for the entire picture. If the MAD for the entire picture is below
a relatively small threshold T1, then only one reference is used for that list, and
num_ref_idx_lN_active_minus1 is set to 0. If all, or a high percentage greater than
or equal to R% (e.g. R% = 95%), of the macroblocks have a SAD value below a
threshold T2, then again a single reference is used, and
num_ref_idx_lN_active_minus1 is again set to 0. If the reconstructed reference is
used for the distortion calculation, considering that this is also affected by the
quantization process, T1 and T2 should be adjusted/scaled accordingly. The simplest
strategy would be to predefine specific weights that depend on the quantization
parameter (QP) and select T1 and T2 as T1(QP) = a(QP) × T1 and
T2(QP) = b(QP) × T2, where a() and b() are the predefined weights.

• If the above rule does not apply, but the MAD of the entire picture satisfies
a different threshold T3 (T1 < MAD < T3), or the SAD for all (or a high percentage
of) macroblocks satisfies a different threshold T4 (T2 < SAD < T4), then we also
examine the motion vectors and reference indices used by the first reference of
each corresponding list. If all (or a high percentage, e.g. K1%) of the reference
indices used are equal to zero, then again only a single reference is used, and
num_ref_idx_lN_active_minus1 is modified accordingly. Special consideration could
also be given to the motion vectors, i.e. whether all motion vectors are small
enough (low motion activity), to enhance this condition. For example, if a large
percentage (e.g. K2% < K1%) of the blocks in the picture use the zero reference and
at the same time have motion vector components MVx and MVy lying in the ranges
[-mx1, mx2] and [-my1, my2] respectively (e.g. mx1 = mx2 = my1 = my2 = 1), a single
reference is used.

• Otherwise, the remaining references are also compared to the current
picture, through the calculation of the entire picture or block/macroblock SAD
values. If the MADi for reference i is above a threshold T5, or all macroblocks
have a value of SADi larger than a threshold T6, then this reference is removed
from the examined references. Similar to the previous conditions, motion vectors
and reference indices from the closest to the furthest reference can be considered
and assist in the decision, by also adapting the values of T5 and T6. In
particular, if a reference is not used by another reference that is closer to the
current picture, then these thresholds are reduced (a reduction implies that the
reference has a higher probability of being removed from the references examined).

• Since the distortion values for a reference compared to its own
references might have already been pre-computed for that picture's reference
number decision, these distortion values could be reused as an additional decision
mechanism. In particular, if it is already known that a reference is very similar
to the current picture but has high distortion compared to another reference, then
it is very likely that the current picture will also have high distortion versus
that other reference; the distortion calculation could then be completely avoided,
and that reference is automatically removed from the references that will be used.
On the other hand, considering the distortion (or the residual, if available)
between these two references after motion compensation would probably lead to a
better decision and performance.

• Finally, it is also possible to use these generated statistics to perform a
reordering of the references (references with smaller distortion are placed with
higher priority in the list), i.e. in H.264 by signaling the reference picture list
reordering elements (section 7.3.3.1).

Other methods for estimating distortion could also be used, and this
method could be combined with weighted prediction strategies.
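
The first three rules and the reordering step above can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: pictures are lists of pixel rows, the thresholds t1 to t6 are supplied by the caller (already scaled for QP if the reconstructed reference is used), the percentages r, k1 and k2 are given as fractions, and all function names and data layouts are hypothetical.

```python
def block_sads(cur, ref, bs=16):
    """SAD of every bs x bs block; pictures are equally sized lists of pixel rows."""
    h, w = len(cur), len(cur[0])
    return [
        sum(abs(cur[j][i] - ref[j][i])
            for j in range(y, min(y + bs, h))
            for i in range(x, min(x + bs, w)))
        for y in range(0, h, bs)
        for x in range(0, w, bs)
    ]


def share(flags):
    """Fraction of true entries; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def use_single_reference(sads, n_pixels, ref_blocks, t1, t2, t3, t4,
                         r=0.95, k1=0.95, k2=0.90, mv_limit=1):
    """Apply the first two rules.

    ref_blocks holds (ref_idx, mv_x, mv_y) for each block of the first
    reference in the list (an assumed layout).
    """
    mad = sum(sads) / n_pixels
    # Rule 1: the picture is nearly identical to its first reference.
    if mad < t1 or share(s < t2 for s in sads) >= r:
        return True
    # Rule 2: moderately similar, so consult the first reference's own choices.
    if mad < t3 or share(s < t4 for s in sads) >= r:
        if share(idx == 0 for idx, _, _ in ref_blocks) >= k1:
            return True
        still = share(idx == 0 and abs(mx) <= mv_limit and abs(my) <= mv_limit
                      for idx, mx, my in ref_blocks)
        return still >= k2
    return False


def prune_and_order(ref_stats, t5, t6):
    """Rule 3 plus reordering.

    ref_stats maps a reference index to (mad, sads) measured against the
    current picture. References that are too dissimilar are dropped, and the
    rest are returned sorted by increasing MAD.
    """
    kept = [(mad, i) for i, (mad, sads) in ref_stats.items()
            if not (mad > t5 or all(s > t6 for s in sads))]
    return [i for _, i in sorted(kept)]
```

When `use_single_reference` returns true, the encoder would set num_ref_idx_lN_active_minus1 to 0 for that list; otherwise the list returned by `prune_and_order` gives the surviving references in priority order.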

For B pictures a similar strategy could be followed. On the other hand, as
previously discussed, it is also possible to use both lists for deciding whether a
reference will be kept or not. In particular, if the list1 reference of a B picture
(e.g. P9 for pictures B7 and B8) uses only a single reference, which is also the
first reference in list0 (P6), and there is a temporal relationship between these
pictures, as can be seen in Figure 1, then it is very likely that these B pictures
would also be using a single reference for list0. We may again consider the
distortion of these references, but also the motion information, and in particular
whether most blocks in the list1 reference are stationary (having zero or close to
zero motion). If the list0 reference also uses a single reference, entirely or in
its majority, this rule could be strengthened further, while motion vectors and
distortion between each reference could again be considered.
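
The cross-list rule for B pictures can be sketched as below. The block layout, the function name, and the default limits (the 90% share is borrowed from the earlier B-picture example; the motion limit and the 50% stationary share are assumptions) are illustrative only.

```python
def b_picture_single_list0(list1_ref_blocks, first_list0_ref,
                           min_share=0.90, mv_limit=1, min_still=0.5):
    """Decide whether B pictures between two P pictures need one list0 reference.

    list1_ref_blocks: (referenced_picture, mv_x, mv_y) for every block of the
    list1 reference (e.g. P9); first_list0_ref is the first list0 picture
    (e.g. P6). Both layouts are hypothetical.
    """
    n = len(list1_ref_blocks)
    if n == 0:
        return False
    # Blocks of the list1 reference that predict from the first list0 picture.
    same_ref = sum(1 for pic, _, _ in list1_ref_blocks if pic == first_list0_ref)
    # Blocks with zero or close to zero motion.
    still = sum(1 for _, mx, my in list1_ref_blocks
                if abs(mx) <= mv_limit and abs(my) <= mv_limit)
    return same_ref / n >= min_share and still / n >= min_still
```

For example, if 19 of 20 blocks in P9 predict from P6 with zero motion, the sketch suggests a single list0 reference for B7 and B8.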

VARIOUS EMBODIMENTS, ADVANTAGES, ETC.:

1. Perform motion estimation and compensation examining all possible
combinations and re-orderings of available references and select the one that
minimizes a predetermined condition (rate, distortion, or a combination thereof).

2. Perform motion estimation and compensation examining all references, and
recode if only a single reference was used.

3. Perform motion estimation and compensation using only a single reference or a
subset of the number of references for a list originally specified in the encoder,
by examining whether certain criteria are met.

4. As described in number 3, where if only a single reference or fewer references
than originally specified are used, num_ref_idx_lN_active_minus1 is reduced
accordingly.

5. As described in numbers 3 and/or 4, where the distortion between the current
and first reference picture is used for determining whether a single reference is
to be used.

6. As described in number 3, where motion information and references are used
for the decision.

7. As described in number 6, where additional references are also examined with
regard to distortion and motion information, and removed if they do not satisfy
some given criteria.

8. As described in numbers 3 and/or 4, where motion information/distortion from
a different list reference could also be used for determining the number of
references that will be used for prediction.

Figure 1: Consideration of the containing P references (P6 and P9) for the
determination of the number of references for B7 and B8